ancestor | An element containing another element (its descendant) anywhere within it. The root element of a document, for example, is the ancestor of every other element in that document. In <A><B><C/></B></A>, the A element is the ancestor of both the B element (which it directly contains) and the C element (which is nested inside the B element). |
application | In SGML terms (and by inheritance, XML), an application is a particular set of vocabulary and structures used by documents. HTML is an application of SGML. MathML and SMIL, two W3C projects, are applications of XML. (Application is also often used to mean 'program' as it does in the rest of computing.) |
attribute | Extra information pertaining to an element that is stored in the start tag of an element (as name="value" pairs) or assigned default values in attribute declarations. Attributes and child elements may sometimes be interchanged; typically, attributes contain information about a particular element, not information that might stand on its own as an extra element. |
attribute type declaration | A declaration listing the attributes that may be used with an element, giving each attribute's name and type. Optionally, the declaration may provide a set of acceptable values, a default value, or a fixed value for each attribute. |
Cascading Style Sheets (CSS) | A standard providing control over element formatting by annotating the document tree with presentation information. |
case-sensitive | Case-sensitivity means that characters must match exactly, including their case, for a comparison to indicate that two strings are equal. Typically, applications written for the English language transform lower-case and upper-case characters into a single case for searching and much processing, and treat "THIS" and "this" as identical. However, almost everything in XML (except xml:lang attribute values) is case-sensitive, reducing processing and simplifying internationalization. |
CDATA | see character data |
character data (CDATA) | CDATA has two very different meanings in XML. The first meaning is used within document type declarations, where CDATA is used within attribute declarations to indicate that an attribute should contain character content, and that no enumerated set of values is provided to constrain that content. The second meaning applies only within documents, where CDATA marked sections (beginning with <![CDATA[ and ending with ]]>) label text within documents that is purely character data, containing no elements or entities that need to be processed. CDATA sections provide an 'escape' mechanism supporting documents containing characters (typically <, >, and &) that would interfere with normal processing. |
child element | An element nested directly inside another element. In <A><B><C/></B></A>, the B element is the child element of the A element, and the C element is the child of the B element, but the C element is not the child of the A element. |
comment | Used to make a document more readable, although not required. The text of a comment appears between comment tags (e.g., <!-- text of comment -->). Comments can be used throughout a document outside other markup, but they are often found in a DTD or XML schema to explain the purpose of a certain section, to indicate what references mean, etc. |
content | The text and markup contained within an element or an attribute value. |
content model | A set of rules describing the content that may appear inside an element, typically specified using element declarations. (Attribute declarations may provide constraints on content, but their constraints are not normally referred to as a content model.) |
DCD | see Document Content Description for XML |
DDML | see Document Definition Markup Language |
descendant | An element contained by another element, even when other container elements are in between. Every element in an XML document (besides the root element itself) is a descendant of the root element, for example. In <A><B><C/></B></A>, the C element is the descendant of both the B and A elements. At the same time, the B element is the descendant of the A element. |
Document Content Description for XML (DCD) | A schema proposal made to the W3C by Microsoft and IBM. DCD builds on a subset of XML-Data, but uses RDF syntax to describe document structures and data types. |
Document Definition Markup Language (DDML) | A schema proposal created by the XML-Dev mailing list (where it was known as 'XSchema') and submitted to the W3C. DDML's primary goal was the creation of a highly readable and easily processed language for describing document structures. |
document element | Also known as the root element. The single, top-level element that contains all the data in an XML document. It can also include any number of nested subelements and external entities. The arbitrary name you assign to the document element will usually represent the purpose of the document and/or your application's problem domain. |
document type declaration | A declaration that provides a document type definition for a document. The document type declaration may reference an external file (the 'external subset'), include additional declarations (the 'internal subset'), or combine both of those possibilities. The document type declaration also declares the element that will be used as the root element for the document. (Never abbreviate the document type declaration as 'DTD'; that acronym is reserved for the document type definition.) |
document type definition (DTD) | A formal description of the structure of a document that may also provide some content information. DTDs effectively describe XML file formats, providing the vocabulary and allowable structure of the elements in an XML document. The DTD for a document is the combination of the internal and external subsets described by the document type declaration. |
document | In XML, any sequence of text that can be interpreted as well-formed XML. Documents may correspond to 'files,' or they may be generated from databases, or by any other process that results in a stream of text that matches the XML specification. (XML documents contain an optional prolog and miscellaneous comments and processing instructions, followed by a required root element containing the content of the document, which may be followed by white space and more comments and processing instructions.) |
DTD | see Document Type Definition |
EBNF | see Extended Backus-Naur Form |
element type declaration | A declaration in the document type definition which defines an element type by declaring its name and content model. Only one element type declaration is allowed per element name in any DTD. |
element | The unit forming the basic structure of XML documents. Elements may contain attributes (in their start tags), other elements, and textual content. |
empty element | An element without any content. Empty elements may be written as a start tag immediately followed by an end tag (like <EMPTY></EMPTY>) or as a single tag ending with /> instead of > (like <EMPTY />). Empty elements may carry only attributes; if full start and end tags are used, not even white space may appear between them. |
encoding declaration | The encoding declaration specifies the character encoding used for a document. If it appears, it must appear within the XML declaration at the beginning of a document. |
end tag | A tag that marks the end of an element. An end tag uses the syntax </Name>, where Name is the same as the element name that was used in the start tag for the element. |
entity | A named unit of content that can be included by reference, used for content reuse and document size minimization. Developers can use entities to simplify the management of information that appears repeatedly. XML Instance refers to entities as reusables. |
enumeration | A list of acceptable values, used in attribute type declarations to limit the possible values for an attribute. |
Extended Backus-Naur Form (EBNF) | A formal notation for describing syntax, used to define the grammar in standards like XML 1.0 and Resource Description Framework (RDF). |
Extensible Markup Language (XML) | A markup language defined by the W3C that provides a strict set of standards for document syntax while allowing developers, organizations, and communities to define their own vocabularies. |
Extensible Stylesheet Language (XSL) | A style sheet standard in development at the W3C. XSL uses template rules (written using XML) to transform documents into 'formatting objects' which are then presented on screen, in print, or in other media. |
external DTD subset | The part of a DTD stored outside of the document itself. XML Instance is designed to edit the external DTD subset, which typically holds information used for a set of documents, not just particular to a single document. |
external entity | An entity whose declaration refers to an external resource, rather than including the content represented by the entity in the declaration itself. |
fatal error | Any violation of the well-formedness constraints described in XML 1.0. Parsers must stop processing the document and report an error to the application rather than sending it document content. |
fragment | A portion of an XML document. Fragments are not always well-formed. The W3C XML Activity has formed a working group to describe how to represent fragments and their context, even those which aren't well-formed, without having to transmit XML that isn't well-formed. |
general entity | An entity whose content may be referenced within document content (as opposed to within the DTD). |
id | A value uniquely identifying an element, stored in an attribute. For a document to be valid, each element may only have one id attribute, and every id in the same document must have a unique value. |
idref | An attribute value referring to another element by its id value. For a document to be valid, the value of an idref must point to an id value in the same document. The idrefs attribute type allows an attribute to contain a list of id values, separated by whitespace. |
instance | The use of an element or document type in a document, as opposed to its declaration. The term is also used to refer to documents; a document that conforms to a particular DTD is an 'instance' of that DTD. |
internal DTD subset | The declarations which appear inside the document type declaration of a document, not those referenced in an external file. The internal DTD subset allows document authors to customize DTDs to meet their needs (typically with extra entities), but can also be difficult to manage as different versions of the DTD spread across a large number of documents. |
internal entity | An entity which describes the content it represents within its declaration rather than referring to an external resource. |
markup | Information describing the content of a document. Declarations, tags, processing instructions, and comments are all markup. |
mixed content | A content model for elements that allows text and a limited number of elements to appear in the element. Once an element is declared to have mixed content, the content model can only describe which elements may appear in the content, but can't constrain their order or the number of times they appear. |
name characters | Characters that are letters, digits, hyphens, underscores, colons, or full stops (periods). Because XML is based on Unicode, the number of characters that meet this criterion is enormous. Although colons are permitted, their use is strongly discouraged except for applications using the W3C's Namespaces in XML recommendation. |
name token | A string composed of any combination of name characters. Name tokens are used in some attribute values to constrain the possible values. (nmtoken is the abbreviation used in DTDs for name token.) |
name | A string beginning with a letter, underscore, or colon and containing only name characters. Element and attribute names, for example, must be 'names' by this formal description. |
namespace | A tool for uniquely identifying elements and attributes by attaching a prefix to the front of their names and declaring a reference from that prefix to a URI. The URI contains the 'real' name, while the prefix is a convenience. Validation in XML 1.0 will take place using the prefix and not the URI, however. |
nesting | Nesting is the process of embedding one object or construct within another. An XML document can contain nested elements and even other documents. Element nesting establishes parent/child relationships. Every child element resides completely within its parent element. |
notation | A declaration that allows for the convenient 'naming' of file types (and sometimes data types). Notations may be referenced in attributes and in declarations for unparsed entities. |
parameter entity | An entity which describes content that may be used within a DTD. Parameter entities used within the internal subset must contain complete declarations, while those used in the external subset may contain fragments (like element content models or sets of attributes used for multiple elements). |
parent element | An element directly containing another element. In <A><B><C/></B></A>, the A element is the parent element of the B element, and the B element is the parent of the C element, but the A element is not the parent of the C element. |
parsed character data (#PCDATA) | Parsed character data is the textual content used in elements. The parser will expand entities within that content. |
parser | A tool used to convert a stream of XML information into a set of structures that an application can use. Typically, applications use parser components built by other developers. Parsers generally come in the two flavors of non-validating and validating, though additional features vary within those two groups and some parsers can be used with validation turned on or off. |
processing application | An application that receives information from a parser and does something with it, like edit, display, transform, or retransmit it. |
processing instruction | Directives that may appear almost anywhere in an XML document and are typically used to pass along information that is neither quite structure nor quite content, such as references to style sheets. Processing instructions begin with <? and end with ?>. |
processor | The term used in the XML 1.0 specification for the tool more commonly referred to as a parser. |
prolog | The beginning of a document, where the XML declaration, document type declaration, and processing instructions or comments may appear. |
RDF | see Resource Description Framework |
recursion | A programming technique in which a function may call itself. Recursive programming is especially well-suited to parsing nested markup structures. |
Resource Description Framework (RDF) | A standard for storing information about information, commonly referred to as metadata. RDF can (but doesn't have to) use XML syntax, and has its own set of rules for creating schemas. |
reusable | In XML Authority, predefined content for declarations that can be included by reference. (The underlying implementation is a parameter entity.) |
root element | The first element in a document, which must contain all the other elements in the document. |
schema | A set of rules describing a document structure. XML 1.0 Document Type Definitions are one type of schema. The XML-Data, DCD, SOX, and DDML Notes at the W3C represent different proposals for XML schema frameworks. XDR (XML-Data Reduced), a subset of XML-Data, is implemented in Internet Explorer 5.0. RDF has its own set of schemas separate from those used for XML. |
schema adjunct | An XML document that contains additional application-specific meta-data relative to a particular schema. An XML-processing application can use a schema adjunct for meta-data that is beyond the scope of a schema, rather than embedding the meta-data in the application where it could be difficult to maintain. The Schema Adjunct Framework (SAF) API defines Java interfaces to extract that meta-data as needed. The SAF source code is provided with the SDK along with the compiled Java library file, SAF.jar. |
Schema for Object-oriented XML (SOX) | A schema proposal submitted to the W3C by Veo Systems, Inc., that describes XML document structures in terms familiar to object-oriented software developers. SOX allows developers to use tools commonly found in object-oriented development environments, notably inheritance, to describe document types. |
SGML | see Standard Generalized Markup Language |
sibling | An element sharing the same parent element as another element. In <A><B/><C/></A>, the B and C elements are sibling elements because they are both child elements of the parent A element. |
SOX | see Schema for Object-oriented XML
standalone | A declaration made in the XML declaration indicating (by value "yes" or "no") whether a document references external resources through the document type declaration or entity mechanisms. |
Standard Generalized Markup Language (SGML) | The 'parent' language to HTML and XML, itself descended from Generalized Markup Language. SGML is an ISO standard and is referenced formally as ISO 8879:1986, though there have been more recent updates. |
start tag | The opening tag that identifies the beginning of an element within a document and may contain attribute values. Start tags typically use the syntax <elementName attribute="value" > but may alternatively use the syntax <elementName attribute="value" /> if the element has no textual content or sub-elements. |
style sheet | A list of specifications describing how to present a document in a particular medium. Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL) are the dominant style sheet mechanisms in the XML space. |
tag | A piece of markup used to indicate element beginnings and endings (sometimes within the same tag). In <A><B/></A>, <A> is the start tag for the element A, </A> is the end tag for the element A, and <B/> is an empty tag representing the element B. |
text | see character data |
UCS-2 | The canonical encoding for Unicode characters, representing characters using the complete 2-byte version of every character. |
Unicode | A standard for international character encoding, maintained by the Unicode Consortium (http://www.unicode.org). XML supports Unicode 2.0 through its reference to ISO 10646, an ISO standard that supports (and extends) the Unicode standard. |
Uniform Resource Identifier (URI) | A resource identifier that may be either the familiar Uniform Resource Locator or the still nonstandardized Uniform Resource Name. |
Uniform Resource Locator (URL) | A resource identifier that describes its target by presenting a pathway for retrieving it. A URL may include a protocol, a host computer, and directions for finding the target resource on that computer. |
Uniform Resource Name (URN) | A resource identifier that uses a naming scheme to identify resources. Catalogs for resolving this scheme to actual resources have remained rare, however. |
unparsed entity | An entity whose reference points to non-XML information. Unparsed entities are passed to the processing application, not expanded by the XML parser itself. |
URI | see Uniform Resource Identifier
URL | see Uniform Resource Locator
URN | see Uniform Resource Name
UTF-8 | An encoding for Unicode that creates smaller files for documents that use the ASCII character set for most of their characters. Most characters used in English will be represented as a single byte, while other characters will be represented as two or more bytes. |
valid | Documents are valid according to the XML 1.0 Recommendation if they conform to the content model provided for them in a DTD or XML schema and are well-formed. Valid documents can be processed successfully with both validating and non-validating parsers. |
validity constraints | Rules for processing XML documents that only validating parsers are required to enforce. Parsers should report an error to an application when a validity constraint is violated, but are not required to stop processing. |
XHTML (Extensible HyperText Markup Language) | A 26 January 2000 W3C Recommendation in which HTML is expressed as a set of XML modules. (XHTML was previously known as 'Voyager'.) |
XSchema | see Document Definition Markup Language |
W3C | The World Wide Web Consortium, which sets the standards for XML, HTML, HTTP, XSL, CSS, RDF, and a number of other Web-oriented standards. |
well-formed | A document that meets the basic document syntax rules of XML. All elements must be cleanly nested, tags must be properly constructed, and the overall structure of XML documents (in which the XML declaration appears only at the beginning and there is only one root element, for example) is observed. To be an 'XML document,' a document must be well-formed. |
well-formedness constraint | A rule that both validating and non-validating XML parsers must enforce. If a document violates a well-formedness constraint, the parser must halt processing and present the application with a fatal error. |
XLink (XML Linking Language) | A proposal in development at the W3C that describes a set of structures for creating sets of linked XML documents and resources. |
XML Activity | The group of committees at the W3C working on the further development of XML core standards. |
XML declaration | The declaration at the beginning of an XML document that identifies the version and encoding used in the document, as well as whether the document is a 'standalone' document. |
XML | see Extensible Markup Language |
XML-Data | XML-Data is a proposal from Microsoft, Inso, and DataChannel that provides a schema for describing data and data types within XML. A subset of XML-Data is implemented in Microsoft Internet Explorer 5.0. |
xml:lang | An attribute that identifies the language of the element content. All elements contained within an element that has the xml:lang attribute declared will 'inherit' the xml:lang value from that parent element unless they declare their own xml:lang attribute. |
xml:space | An attribute that identifies how white space in the element content should be handled. All elements contained within an element that has the xml:space attribute declared will 'inherit' the xml:space value from that parent element unless they declare their own xml:space attribute. |
XPath (XML Path Language) | XPath is a W3C Recommendation that specifies a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. |
XPointer (XML Pointer Language) | A set of tools for selecting document fragments. XPointer is based on XPath and is under development at the W3C in conjunction with the XLink project. |
XSL | see Extensible Stylesheet Language |
XSLT (XSL Transformations) | XSLT is a W3C Recommendation that specifies a language for transforming XML documents into other XML documents. XSLT is designed for use as part of XSL, which includes an XML vocabulary for specifying formatting. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. |
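
The element relationships defined in this glossary (child, descendant, ancestor, sibling) and the idea of a well-formedness check can be tried out with a short script. The following is only a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper names (children, descendants, ancestors, is_well_formed) are hypothetical and chosen for illustration. ElementTree is a non-validating parser, so the sketch can report well-formedness errors but says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET

# The sample document used in several glossary entries.
SAMPLE = "<A><B><C/></B></A>"


def children(element):
    """Tag names of the elements nested directly inside `element`."""
    return [child.tag for child in element]


def descendants(element):
    """Tag names of every element nested anywhere inside `element`."""
    # Element.iter() yields the element itself first, so skip it.
    return [node.tag for node in element.iter() if node is not element]


def ancestors(root, target_tag):
    """Tag names of the elements containing the first `target_tag` element,
    listed outermost first. Returns [] if the tag is the root or is absent."""
    # ElementTree elements keep no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = root if root.tag == target_tag else root.find(f".//{target_tag}")
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node.tag)
    return list(reversed(chain))


def is_well_formed(text):
    """True if `text` parses; a ParseError plays the role of the fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(SAMPLE)
print(children(root))                       # ['B']
print(descendants(root))                    # ['B', 'C']
print(ancestors(root, "C"))                 # ['A', 'B']
print(children(ET.fromstring("<A><B/><C/></A>")))  # ['B', 'C'] (siblings)
print(is_well_formed("<A><B></A></B>"))     # False: tags are not cleanly nested
```

Building the child-to-parent map once keeps the ancestor lookup simple; a parser that exposes parent links directly would not need it.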
Copyright 2000 Extensibility, Inc.
Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516